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METHODS FOR RENAMING STACK REFERENCES 
IN A COMPUTER PROCESSING SYSTEM 



BACKGROUND 

1. Technical Field 

The present invention generally relates to computer 
processing systems and, in particular, to methods for 
renaming stack references in a computer processing system, 



10 2 . Background Description 

A memory serves as a repository of information in a 
computer processing system. FIG. 1 is a block diagram 
illustrating a typical layout of a memory 10 0 of a computer 
program according to the prior art. The layout consists of 

15 distinct memory areas, including a program text area 104, a 

program data area 106, a heap 108, and a program stack 110. 
Program text area 104 is used to store program text (i.e., 
computer instructions) . Program data area 106 is used to 
store program data (for static data references) . Heap 108 

20 is used for dynamically allocated objects and program stack 

110 is used for function-local variables. 
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As shown, memory 100 stores different types of data in 
distinct memory areas. The following different mechanisms 
are used to access these memories: 

1. Program text area 104 stores the computer 

5 instructions describing the actions of a program, 

and possibly program constants. Program text area 
104 is usually read-only and accessed using the 
program counter. 

2. Program data area 105 holds static data 

10 references, e.g., global program variables. 

Program data area 106 is accessed using either a 
global data pointer or a table of contents data 
structure , 

3. Heap 108 holds dynamically allocated objects and 
15 is accessed using pointers held in any of the 

processor registers. 

4. Program stack 110 usually holds function-local 
variables and is accessed using special-purpose 
registers, such as the stack pointer (SP) , frame 

20 pointer (FP) , or argument pointer (AP) . 



Usually, all program memory can be accessed through 
the use of pointers which are stored in a register. 
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However, the access mechanisms described above are generally 
used for each area in typical programs. 

In general, a processor accesses information from the 
memory, performs computations thereon, and stores the 
5 results back to memory. Unfortunately, memory access incurs 

a number of costs. A description of some of these costs 
will now be given. 

When a memory access operation is first detected, the 
address to be accessed must be resolved. Moreover, the 
10 registers employed for the address computation must be 

available . 

If the processor wants to reorder memory read 
operations with respect to other memory operations, and it 
cannot be determined that the read addresses are different 

15 at the time when they are to be reordered, then checks for 

memory address ambiguities need to be performed. 

In addition, since store operations modify the 
processor state, they typically have to be performed 
in-order. This causes further slowdowns in achievable 

20 processor performance by serializing operations when 

multiple live ranges are assigned to the same memory 
location. Thus, limitations are typically imposed on the 
degree of reordering that can be performed in a superscalar 
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processor, when multiple independent values are assigned to 
the same memory address. 

Moreover, load and store operations typically require 
access to a cache (s). However, accessing a cache is slower 
5 in comparison to accessing processor registers, which 

represent a higher level in the memory hierarchy of a 
computer processing system. 

Many of the serializing effects of memory references 
result from the way in which programs are written by 
10 programmers. However, serializing effects of memory 
references may also result from the way programs are 
translated from their source level representation to the 
actual machine. In such a case, references are made to the 
program stack. 

15 The program stack stores stack frames, that is, 

records containing the values for local variables of 
functions, as well as parameters passed between functions. 
Stack locations are reused frequently, with different 
functions using memory locations with the same address to 

20 store unrelated objects. 

Consider the following example code written in the C 
programming language : 
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int mult3 (int a) 

return a * 3 ; 

int inc (int b) 

return b+1; 

int compute (int a, int b) 

int tmpl, tmp2; 

tmpl = mult3 (a) ; 
tmp2 = inc (b) ; 

return tmpl+tmp2; 



20 



When this code is translated to Intel x86 machine 
code, the following instructions will be generated: 



I mult3 : 

25 2 imull $3 , 4 (%esp) , %eax 

3 ret 
4 

5 inc : 

6 movl 4(%esp),%eax 
30 7 incl %eax 

8 ret 
9 

10 compute; 

II pushl %esi 
35 12 pushl %ebx 

13 movl 12 (%esp) , %eax 

14 movl 16 (%esp) , %ebx 

15 pushl %eax 

16 call mult3 
40 17 addl $4,%esp 
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18 movl %eax,%esi 

19 pushl %ebx 

20 call inc 

21 addl $4,%esp 

22 addl %esi,%eax 

23 popl %ebx 

24 popl %esi 

25 ret 



10 The immediately preceding code illustrates several examples 
of the inefficiencies of holding the processor stack in 
memory : 

1. The values of registers ESI and EBX are stored on 
the stack at instructions 11 and 12, and restored at 

15 instructions 23 and 24. These values could have 

been held in processor- internal registers. 

2 . The parameters a and b which were pushed onto the 
stack by the calling function must be read from 
the stack into a processor register, and then stored 

20 on the stack for functions mult3 and inc, 

respectively. 

3. The parameters a and b for functions mult3 and inc, 
respectively, are stored at the same stack location, 
so operations from function inc cannot be scheduled 

25 at the same time as the instructions for function 

mult3. This serialization is not necessary. 
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The serializing effects of stack references due to the 
reuse of memory locations and the manipulation of the stack 
pointer is described by Postiff et al . , in "The Limits of 
Instruction Level Parallelism in SPEC95 Applications", 

5 International Conference on Architectural Support for 

Programming Languages and Operating Systems (ASPLOS-VIII) 
Workshop on Interaction between Compilers and Computer 
Architectures (INTERACT-3) , October 1998. Postiff et al, 
further describe the performance improvements which can be 

10 achieved by resolving these serializing effects. 



3. Problems with the State of the Art 

It is to be appreciated that previous memory renaming 
has been based on renaming of general memory references, and 

15 has tended to ignore multiprocessor effects. Some of these 

prior art approaches will now be described. 

It is to be appreciated that memory renaming typically 
includes the prediction of data dependencies. A mechanism 
to predict data dependencies dynamically without computation 

20 of the address is described by A. Moshovos and G. Sohi, in 
"Streamlining Inter-operation Memory Communication via Data 
Dependence Prediction", Proceedings of 3 0th Annual 
International Symposium on Microarchitecture Research 
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Triangle Park, NC, December 1997. Predicting dependencies is 
necessary because the addresses of load and store operations 
may be unresolved. To ensure correctness of the 
predictions, these memory operations need to be eventually 

5 executed. A similar approach for predicting dependencies is 

described by G. Tyson and T. Austin, in "Improving the 
Accuracy and Performance of Memory Communication Through 
Renaming", Proceedings of 30th Annual International 
Symposium on Microarchitecture Research, Triangle Park, NC, 

10 December 1997. Moshovos & Sohi and Tyson & Austin provide a 
general technique for promoting accesses to memory into 
processor-internal registers. This requires hardware of 
significant complexity. Moreover, prediction is used, which 
is not as accurate as actual decoding of the instruction, 

15 and may require expensive repair actions. An address 

resolution buffer which supports out-of-order execution of 
memory operations and memory renaming is described by M, 
Franklin and G. Sohi, in "ARB: A Hardware Mechanism for 
Dynamic Reordering of Memory References", IEEE Transactions 

20 on Computers, Vol. 45, No. 5, May 1996, Disadvantageously , 

this buffer is expensive, the required hardware is complex, 
and the buffer does not consider multiprocessor systems and 
their consistency requirements. 
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U.S. Patent No. 5,911,057, entitled "Superscalar 
Microprocessor Having Combined Register and Memory Renaming 
Circuits, Systems, and Methods", issued on June 8, 1999, the 
disclosure of which is incorporated herein by reference, 
5 describes an architecture for renaming memory and register 

operands in uniform fashion. Memory coherence is based on 
"snooping" memory requests. While this approach is 
sufficient for the in-order execution of memory operations 
in a multiprocessor computing system, out-of-order operation 

10 in a multiprocessor system may generate incorrect results. 

U.S. Patent No. 5,838,941, entitled "Out-of-order 
Superscalar Microprocessor with a Renaming Device that Maps 
Instructions from Memory to Registers", issued on November 
17, 1998, the disclosure of which is incorporated herein by 

15 reference, describes symbolic renaming of memory references. 

This invention deals with equivalence of all types, and 
requires lockup of an associative array to establish 
equivalence between expression and names. This results in a 
complex architecture with potentially severe cycle time 

20 impact . 

Thus, it would be desirable and highly 
advantageous to have a method for eliminating serializing 
effects resulting from stack references. It would be 
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further desirable and advantageous if such method was 
applicable in a multiprocessor system, 

SUMMARY OF THE INVENTION 

5 The problems stated above, as well as other related 

problems of the prior art, are solved by the present 
invention, methods for renaming stack references in a 
computer processing system. 

According to a first aspect of the invention, there is 

10 provided a method for renaming memory references to stack 
locations in a computer processing system. The method 
includes the steps of detecting stack references that use 
architecturally defined stack access methods, and replacing 
the stack references with references to processor- internal 

15 registers. 

According to a second aspect of the invention, the 
method further includes the step of synchronizing an 
architected state between the processor-internal registers 
and a main memory of the computer processing system. 

20 According to a third aspect of the invention, the 

method further includes the step of inserting in-order write 
operations for all of the stack references that are write 
stack references. 
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According to a fourth aspect of the invention, the 
method further includes the step of performing a 
consistency-preserving operation for a stack reference that 
does not use the architecturally defined stack access 
5 methods . 

According to a fifth aspect of the invention, the step 
of performing a consistency-preserving operation includes 
the step of bypassing a value from a given 
processor- internal register to a load operation that 
10 references a stack area and that does not use the 

architecturally defined stack access methods. 

According to a sixth aspect of the invention, the 
architecturally defined stack access methods include memory 
accesses that use one of a stack pointer, a frame pointer, 
15 and an argument pointer. 

According to a seventh aspect of the invention, the 
architecturally defined stack access methods include push, 
pop, and other stack manipulation operations. 

According to an eighth aspect of the invention, there 
20 is provided a method for renaming memory references to stack 

locations in a computer processing system. The method 
includes the step of determining whether a load instruction 
references a location in a local stack using an 
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architecturally defined register for accessing a stack 
location. It is determined whether a rename register exists 
for the referenced location in the local stack, when the 
load instruction references the location using the 

5 architecturally defined register. The reference to the 

location is replaced by a reference to the rename register, 
when the rename register exists. 

These and other aspects, features and advantages of 
the present invention will become apparent from the 

10 following detailed description of preferred embodiments, 
which is to be read in connection with the accompanying 
drawings . 

BRIEF DESCRIPTION OF THE DRAWINGS 

15 FIG. 1 is a diagram illustrating a typical memory 

layout of a computer program according to the prior art; 

FIG. 2 is a flow diagram illustrating a method for 
performing the initial processing of a single instruction 
according to an illustrative embodiment of the present 
20 invention; 

FIG. 3 is a flow diagram illustrating the processing 
of load operations according to an illustrative embodiment 
of the present invention; 
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FIG. 4 is a flow diagram illustrating the processing 
of a load operation according to another illustrative 
embodiment of the present invention; 

FIG. 5 is a flow diagram illustrating the processing 
5 of store operations according to an illustrative embodiment 

of the present invention; and 

FIG. 6 is a block diagram illustrating a superscalar 
out-of-order processor to which the present invention is 
applied. 

10 

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

The present invention is directed to methods for 
renaming stack references in a computer processing system. 
According to the invention, the stack references are renamed 

15 to processor-internal registers. By concentrating on the 

frequent rename opportunities for stack references, the 
renaming architecture can be more efficiently implemented 
than prior art approaches to renaming memory references. 

Such efficiency is derived from two aspects. With 

20 respect to the first aspect, the names are easier to 

determine, since references to the processor stack use a 
limited number of addressing registers such as the stack and 
frame pointers. This reduces the possible ambiguities that 
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can arise in the renaming of memory locations using 
different general purpose registers. While accesses using 
other registers are possible, they are sufficiently 
infrequent to be handled using simple disambiguation 
5 techniques . 

With respect to the second aspect, the need to perform 
consistency-preserving operations in a multiprocessor system 
is significantly reduced, since stack references are usually 
referenced only from the local process. Again, references 

10 from other processors are possible, but since they are 

infrequent, they can be resolved using simple techniques. 

It is to be appreciated that by renaming stack 
references to processor registers, stack references become 
amenable to a number of optimizations typically applied to 

15 processor registers, such as reordering of references, 

renaming to resolve anti -dependencies , and speculative 
execution of write operations. 

To facilitate a clear understanding of the present 
invention, definitions of terms employed herein will now be 

20 given. A load instruction refers to any instruction 

performing a memory read-access and (optionally) 
computations based on the loaded value. Thus, a load 
instruction may include, for example, logic, arithmetic and 
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other instructions which employ data from memory locations 
as operands. A store instruction refers to any instruction 
performing a memory write-access and, optionally, 
computations. Out-of-order execution is a technique by 

5 which the operations in a sequential stream of instructions 

are reordered so that operations appearing later are 
executed earlier, if the resources required by the later 
appearing operations are free. Thus, an out-of-order 
instruction may be created, either statically or 

10 dynamically, by moving an instruction from its original 
position in a sequence of instructions to an earlier 
position in the sequence of instructions. 

The following description and corresponding examples 
will be given based on two instructions (unless otherwise 

15 noted) , a first instruction which is executed out-of-order 

before a second, logically preceding instruction, which will 
be termed an in-order instruction. Thus, unless otherwise 
noted, the designation 'in-order' refers only to the 
sequential relationship between the logically preceding 

20 in-order instruction and the first 'out-of-order' 

instruction. It is to be noted that the above two 
instructions (i.e., both the in-order and the out-of-order 
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load instructions) may be in-order or out-of-order with 
respect to a third instruction (and so forth) . 

A general description of the present invention will 
now be provided to introduce the reader to the concepts of 
5 the invention. Subsequently, more detailed descriptions of 

various aspects of the invention will be provided. 

The present invention simplifies the issue of memory 
renaming by renaming only memory stack references (as 
opposed to renaming all memory references) . Such an 
10 approach is advantageous for at least the following reasons: 

1. Performance degradations due to the serializing 
effects of stack references are artificially 
introduced by the translation process and not 
intrinsic in the program. 
15 2. Significant performance gain can be achieved by 

simply renaming the stack references. 

3. The "names" of stack accesses are easy to determine 
since the stack and frame pointers have well-defined 
semantics (unlike other registers which could point 

20 to any location in memory) . 

4. Since stack references are usually local to one 
process and a single processor, and are made through 
the stack or frame pointers, the importance of 
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multiprocessor consistency is reduced, 
5, Detecting naming equivalence is easy because 

references to the stack are usually only through the 
stack or frame pointers {the frame pointer is 
5 usually at a well-defined distance from the stack 

pointer) . 

Consistency between different references to memory is 
resolved by observing that stack references are usually only 
to the local processor stack and only through the registers 

10 specifically designated to manage the stack, such as the 

stack pointer (SP) , the frame pointer (FP) , and an optional 
argument pointer (AP) . 

References to the stack area of one processor by any 
means other than these specifically designated 

15 stack-management registers result in the performing of 

actions to preserve consistency. Such references can be 
easily detected. For example, accesses to the local 
processor stack by a general register (a register other than 
a stack register such as, for example, SP, FP, and AP) can 

20 be detected by tagging page table entries that contain the 

stack, and indicating the condition in the memory unit. The 
processor can then handle this situation in hardware or 
raise an exception and resolve the condition in the 
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exception handler. As another example, in a multiprocessing 
system, accesses to the stack of another processor can be 
detected by not making entries available in the translation 
lookaside buffer (TLB) , thereby taking an exception and 
5 handling the condition in software, or by indicating in the 

TLB that a page is used to store the stack on a remote 
processor. 

FIG. 2 is a flow diagram illustrating a method for 
performing the initial processing of a single instruction 
10 according to an illustrative embodiment of the present 
invention. 

The instruction is fetched from the program memory 100 
(step 210) . It is then determined whether the fetched 
instruction is a memory operation (i.e., whether the fetched 

15 instruction references memory 100) (step 212) . If so, then 

the memory operation is processed in accordance with the 
present invention (step 214) . The particular processing 
steps are shown in detail in FIGs. 3 and 4 for an 
instruction referencing the memory for read access, and in 

20 FIG. 5 for an instruction referencing the memory for write 

access . 

If the fetched instruction is not a memory operation, 
then references to processor registers are renamed (step 
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216) . It is to be appreciated that step 216 is optional 
and, thus, may be omitted if so desired. The instruction is 
then entered in the dispatch table for dispatch to one of 
the execution units (step 218) , and the method is 
terminated. 

FIG. 3 is a flow diagram illustrating the processing 
of a load operation (e.g., step 214 of FIG. 2) according to 
an illustrative embodiment of the present invention. In 
general, if a rename register is found for a memory read 
reference, then the memory read reference is replaced by a 
reference to the rename register. Otherwise, a load from 
the stack location stored in memory is performed. 

According to the method of FIG. 3, it is determined 
whether the load instruction references a location in the 
local stack using the stack pointer SP or frame pointer FP 
(step 310) . With respect to some architectures, step 310 
could optionally determine whether the load instruction 
references any additional pointers used to access a stack 
location, such as the argument pointer AP in the DEC VAK 
architecture. The load instruction does not have to 
reference the SP, FP or AP registers explicitly, but can be 
an instruction such as push or pop (e.g., as found in the 



Y0999-357 (8728-320) 



19 



Intel x86 architecture) which references at least one such 

pointer implicitly. 

If the load instruction references a location in the 

local stack using the stack pointer SP or frame pointer FP, 
5 then the method proceeds to step 320, Otherwise, it is 

determined whether the load instruction references a 

location in a stack using any other register (step 314) . 

Unlike the determination made in step 310, which was 

concerned with only a local stack location, the 
10 determination made in step 314 encompasses both a local or a 

remote stack location. 

Step 314 can be performed during either the decode, 

address generation, or memory access phase. According to an 

illustrative embodiment of the present invention, one 
15 implementation of step 314 may include marking TLB entries 

of pages in stack 110 (see FIG, 1) as containing stack 

references . 

If the load instruction does not reference a location 
in a stack using any other register, a normal load operation 
20 (i.e., a load operation from main memory or a cache) is 

performed and then the method is terminated (step 316) . 
However, if the load instruction does reference a location 
in a stack using any other register, then a 
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consistency-preserving mechanism is executed to perform a 
load operation from the stack area (step 318) . The 
consistency preserving mechanism can be implemented in 
hardware, software, or a combination thereof. Illustrative 
5 embodiments of the consistency preserving mechanism are 

described in detail hereinbelow. 

At step 320, it is determined whether a rename 
register exists for the referenced location in the local 
stack. Step 320 can be performed using either a symbolic 
10 address, i.e., "(SP)+100" for a memory reference with 

displacement 100 from the stack pointer, or by actually 
computing the address of the referenced location. 

If a rename register exists for the referenced 
location in the local stack, then the reference to the stack 
15 location is replaced by a reference to the rename register 

and the method is terminated (step 322) . However, if a 
rename register does not exist for the referenced location 
in the local stack, then a normal load instruction (i.e., a 
load instruction from main memory or a cache) is inserted in 
20 the instruction stream and the method is terminated (step 

324) . 

Restricting memory renaming to stack references 
provides advantages that simplify address resolution with 
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respect to the above prior art approaches for memory 
renaming. Thus, unlike data dependency prediction as used 
by A. Moshovos and G. Sohi, in the above referenced article 
entitled "Streamlining Inter-operation Memory Communication 

5 via Data Dependence Prediction", actual dependencies can be 

determined by decoding the instruction. 

Moreover, unlike the approach for other symbolic 
renaming techniques, symbolic renaming of stack references 
does not require complex logic to determine equivalencies, 

10 such as that described in U.S. Patent No, 5,838,941, 

entitled "Out-of-order Superscalar Microprocessor with a 
Renaming Device that Maps Instructions from Memory to 
Registers", issued on November 17, 1998, the disclosure of 
which is incorporated herein by reference. With respect to 

15 the approach of the present invention, all references are 

through one of the stack registers which are at an easily 
determinable offset from each other. 

It is to be appreciated that the method of FIG, 3 
requires adjustment of references as the stack pointer value 

20 is changed (e.g., on subroutine calls). However, such 

adjustments may be readily made by one of ordinary skill in 
the related art. 
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Using absolute addresses is also simplified, since the 
values of the stack pointer and frame pointer are easy to 
determine and change infrequently. In particular, changes 
to these registers involve simple arithmetic, which can be 
performed speculatively with a shadowed stack pointer used 
only for renaming in the stack reference rename logic. 

FIG. 4 is a flow diagram illustrating the processing 
of a load operation (e.g., step 214 in FIG. 2) according to 
another illustrative embodiment of the present invention. 
In general, if a rename register is found for a stack 
location, then the memory reference is replaced by a 
reference to the rename register. Otherwise, a new rename 
register is allocated, an instruction to load the value from 
the stack to the rename register is inserted into the 
instruction stream, and the original reference to the memory 
location is replaced by a reference to the newly allocated 
rename register. 

According to the method of FIG. 4, it is determined 
whether the load instruction references a location in the 
local stack using the stack pointer SP or frame pointer FP 
(step 410). With respect to some architectures, step 410 
could optionally determine whether the load instruction 
■ references any additional pointers used to access a stack 
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location, such as the argument pointer AP in the DEC VAX 
architecture. The load instruction does not have to 
reference the SP, FP or AP registers explicitly, but can be 
an instruction such as push or pop (e.g., as found in the 

5 Intel x86 architecture) which references at least one such 

pointer implicitly. 

If the load instruction references a location in the 
local stack using the stack pointer SP or frame pointer FP, 
then the method proceeds to step 420. Otherwise, it is 

10 determined whether the load instruction references a 

location in a stack using any other register (step 414) . 
Step 414 can be performed during either the decode, address 
generation, or memory access phase. According to an 
illustrative embodiment of the present invention, one 

15 implementation of step 414 may include marking TLB entries 

of pages in the stack memory area (see FIG. 1) as containing 
stack references. 

If the load instruction does not reference a location 
in a stack using any other register, then a normal load 

20 operation (i.e., a load operation from main memory or a 

cache) is performed and the method is terminated (step 416) . 
However, if the load instruction does reference a location 
in a stack using any other register, then a 

Y0999-357 (8728-320) 24 



consistency-preserving mechanism is executed to perform a 
load operation from the stack area (step 418) . The 
consistency preserving mechanism can be implemented in 
hardware, software, or a combination thereof. Illustrative 
embodiments of the consistency preserving mechanism are 
described in detail hereinbelow. 

At step 420, it is determined whether a rename 
register exists for the referenced location. This test can 
be performed using either a symbolic address, i.e., 
"{SP)+100" for a memory reference with displacement 100 from 
the stack pointer, or by actually computing the address of 
the referenced location. Restricting memory renaming to 
stack references provided advantages as described above with 
respect to FIG. 3, 

Irrespective of the naming technique used, if a rename 
register exists for the referenced location, then the 
reference to the stack location is replaced by a reference 
to the rename register and the method is terminated (step 
422) . However, if a rename register does not exist for the 
referenced location, then a rename register is allocated for 
the stack location referenced by the load operation (step 
424) , Then, a load instruction is inserted in the 
instruction stream to load the value from the processor into 
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the newly allocated rename register (step 426) , and the 
method returns to step 422. 

FIG. 5 is a flow diagram illustrating the processing 
of a store operation (e.g., in unit 62 0 of FIG. 6) according 
5 to an illustrative embodiment of the present invention. 

It is determined whether the store instruction 
references a location in the local stack using the stack 
pointer SP or frame pointer FP (step 510) . With respect to 
some architectures, step 510 could optionally determine 
10 whether the store instruction references any additional 
pointers used to access a stack location, such as the 
argument pointer AP in the DEC VAX architecture. The store 
instruction does not have to reference the SP, FP or AP 
registers explicitly, but can be an instruction such as push 
15 or pop (e.g., as found in the Intel x86 architecture) which 

references at least one such pointer implicitly. 

If the store instruction references a location in the 
local stack using the stack pointer SP or frame pointer FP, 
then the method proceeds to step 520. Otherwise, it is 
20 determined whether the store instruction references a 

location in a stack using any other register (step 514) . 
Unlike the determination made in step 510 which was 
concerned with only a local stack location, the 

Y0999-357 (8728-320) 26 



determination made in step 514 encompasses both a local or a 
remote stack location. Step 514 can be performed during 
either the decode, address generation, or memory access 
phase. According to an illustrative embodiment of the 

5 present invention, one implementation of step 514 may- 
include marking TLB entries of pages in the stack memory 
a3^ea (see FIG. 1) as containing stack references. 

If the store instruction does not reference a location 
in a stack using any other register, then a normal store 

10 operation (i.e., a store operation from main memory or a 

cache) is performed and the method is terminated (step 516) . 
However, if the instruction does reference a location in a 
stack using any other register, then a 

consistency-preserving mechanism is executed to perform a 
15 store operation to the stack area (step 518) . The 

consistency preserving mechanism can be implemented in 
hardware, software, or a combination thereof. Illustrative 
embodiments of the consistency preserving mechanism are 
described in detail hereinbelow. 
20 At step 520, a new rename register is allocated for 

the stack location referenced by the store operation. The 
rename register can be named using either a symbolic 
address, i.e., "(SP)4-100" for a memory reference with 
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displacement 100 from the stack pointer, or by actually 
computing the address. Restricting memory renaming to stack 
references provides advantages as described above with 
respect to FIG. 3. 

5 The memory reference is replaced by a reference to the 

newly allocated rename register {step 522) . An instruction 
to store the value from the rename register to memory is 
inserted into the instruction stream (step 524), and the 
method is terminated. It is to be appreciated that step 524 

10 is optional and, thus, may be omitted if so desired. 

A conventional implementation of a processor capable 
of dynamically scheduling instructions (an out-of-order 
issue processor) includes the following features: 

1. A mechanism for issuing instructions out-of-order, 
15 which includes the ability to detect dependencies 

among the instructions, rename the registers used by 
an instruction, and detect the availability of the 
resources used by an instruction; 

2. A mechanism for maintaining the out-of-order state 
20 of the processor, which reflects the effects of 

instructions as they are executed (out-of-order) ; 

3. A mechanism for retiring instructions in program 
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order, and simultaneously updating the in-order 
state with the effects of the instructions being 
retired; and 

4. A mechanism for retiring an instruction in program 
order without updating the in-order state 
(effectively canceling the effects of the 
instruction being retired) , and for resuming 
in-order execution of the program starting at the 
instruction being retired (which implies canceling 
all the effects present in the out-of-order state) . 
Mechanism 3 from the list above is used to retire 
instructions when the effects of the instructions being 
retired are correct. Alternatively, mechanism 4 is used 
whenever there is some abnormal condition resulting from the 
execution of the instruction being retired or from some 
external event. 

FIG. 6 is a functional block diagram of a conventional 
computer processing system (e.g., including a superscalar 
processor) to which the present invention may be applied. 
The system of FIG. 6 supports reordering of memory 
operations using the mechanisms listed above, but excluding 
the ability to rename and execute references to the program 
stack out-of-order. The system consists of: a memory 
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subsystem 601; a data cache 602; an instruction cache 604; 
and a processor unit 600. The processor unit 600 includes: 
an instruction queue 603; several memory units (MUs) 605 for 
performing load and store operations; several functional 

5 units (FUs) 607 for performing integer, logic and 

floating-point operations; a branch unit (BU) 609; a 
register file 611; a register map table 620; a 
free-registers queue 622; a dispatch table 624; a retirement 
queue 626; and an in-order map table 628. This exemplary 

10 organization is based on the one described in the article by 
M. Moudgill, K. Pingali, and S, Vassiliadis, "Register 
Renaming and Dynamic Speculation: An Alternative Approach", 
Proceedings of the 2 6th Annual International Symposium on 
Microarchitecture, pp. 202-13 (December 1993) . 

15 In the processor depicted in FIG. 6, instructions are 

fetched from instruction cache 604 (or from memory subsystem 
601, when the instructions are not in instruction cache 604) 
under the control of branch unit 609, placed in instruction 
queue 603, and subsequently dispatched from instruction 

20 queue 603. The register names used by the instructions for 

specifying operands are renamed according to the contents of 
register map table 62 0, which specifies the current mapping 
from architected register names to physical registers. The 

Y0999-357 (8728-320) 30 



architected register names used by the instructions for 
specifying the destinations for the results are assigned 
physical registers extracted from free-registers queue 622, 
which contains the names of physical registers not currently 

5 being used by the processor. The register map table 620 is 

updated with the assignments of physical registers to the 
architected destination register names specified by the 
instructions. Instructions with all their registers renamed 
are placed in dispatch table 624. Instructions are also 

10 placed in retirement queue 626, in program order, including 

their addresses, and their physical and architected register 
names. Instructions are dispatched from dispatch table 624 
when all the resources to be used by such instructions are 
available (physical registers have been assigned the 

15 expected operands, and functional units are free) . The 

operands used by the instruction are read from register file 
611, which typically includes general -purpose registers 
(GPRS) , floating-point registers (FPRs) , and condition 
registers (CRs) . Instructions are executed, potentially 

20 out-of-order, in a corresponding memory unit 605, functional 
unit 607 or branch unit 609, Upon completion of execution, 
the results from the instructions are placed in register 
file 611, Instructions in dispatch table 624 waiting for 
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the physical registers set by the instructions completing 
execution are notified. The retirement queue 626 is 
notified of the instructions completing execution, including 
whether they raised any exceptions. Completed instructions 

5 are removed from retirement queue 626, in program order 

(from the head of the queue) . At retirement time, if no 
exceptions were raised by an instruction, then in-order map 
table 628 is updated so that architected register names 
point to the physical registers in register file 611 

10 containing the results from the instruction being retired; 

the previous register names from in-order map table 628 are 
returned to free-registers queue 622, 

On the other hand, if an instruction has raised an 
exception, then program control is set to the address of the 

15 instruction being retired from retirement queue 626. 

Moreover, retirement queue 626 is cleared (flushed) , thus 
canceling all unretired instructions. Further, the register 
map table 620 is set to the contents of in-order map table 
628, and any register not in in-order map table 628 is added 

20 to free-registers queue 622. 

In addition to the components above, superscalar 
processors may contain other components such as 
branch-history tables to predict the outcome of branches, 
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According to the present invention, a conventional 
superscalar processor that supports reordering of load 
instructions with respect to preceding load instructions (as 
shown in FIG. 6) may be augmented with the following: 

A. A first mechanism for detecting load and store 
operations (explicit load or store operations, or 
any other operations which reference memory using 
the stack or frame pointers, as typically found in 
CISC instruction sets) . 

B. A second mechanism for converting a load or store 
operation into a reference to an internal register. 

C. A third mechanism to detect whether any other load 
instruction (which does not address memory through 
the stack and frame pointers) refers to the 
processor stack, and providing the value to the 
instruction. 

D. A fourth mechanism for detecting a memory reference 
to another processor's stack area, and retrieve the 
value . 

In addition, mechanism 4 above is preferably extended 
to force all stack references currently stored in rename 
registers to processor memory. This is particularly simple 
if, in step 524 of FIG. 5, memory write instructions are 
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entered into the instruction stream to record all stack 
references in processor memory in-order. 

The mechanisms provided by this invention are used in 
conjunction with the mechanisms available in the 
5 conventional out-of-qrder processor depicted in FIG. 6, as 

follows. The first mechanism, which detects memory 
operations, is implemented in the register map table 620. 
If the memory operation refers to a stack reference, it is 
converted into a reference to a rename register using the 
10 second mechanism. Rename registers for stack references may 

be shared with those used for general purpose registers, or 
they may be implemented as distinct processor resources. 

The third and fourth mechanisms are implemented in 
memory units 605 and in system software following a hybrid 
15 organization. 

The third mechanism for the detection of interference 
between general -register based memory accesses and local 
stack references is preferably performed by marking TLB 
entries referring to the processor stack. In this 
20 embodiment, references to such pages using a general 

register cause the processor to discard all speculative 
state and perform the operation in-order. According to this 
embodiment, the in-order values of stack references have 
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been stored to the processor memory by store instructions 
which were inserted in program order (step 524 of FIG. 5) . 
Processing then continues with the instruction following the 
current instruction . 

The fourth mechanism is implemented by not loading 
page translations for memory locations which are used as 
program stack in remote processors. As a result, a 
reference to a stack location on a remote processor raises 
an exception, whereupon the system software can synchronize 
the memory read request from the remote processor's stack 
area with that processor. 

According to one illustrative embodiment of the 
present invention, this is performed in software by sending 
a request to the operating system kernel executing on the 
remote processor. According to another illustrative 
embodiment of the present invention, this request and the 
necessary actions to ensure consistency are performed in 
hardware . 

A brief description of the consistency preserving 
mechanism of steps 318, 418, and 518 of FIGs. 3, 4, and 5, 
respectively, will now be given. In-order store operations 
may be performed {at optional step 524 of FIG. 5) by 
performing a load operation from the corresponding address. 
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If in-order store operations are not performed for stack 
references, then one implementation of a 
consistency-preserving mechanism forces all stack rename 
registers to its corresponding stack locations. This can be 
implemented as either a distinct mechanism, or by activating 
mechanism 4 of FIG. 6 which discards all out-of-order 
operations and has been extended to preferably for all stack 
rename locations to memory. In another embodiment, load 
references to the local processor stack may also be bypassed 
directly from the rename registers. 

If multiple values are present, then the last value 
written before the current load operation must be retrieved 
in steps 318 and 418 of FIGs . 3 and 4, respectively. One 
way to do this is to discard all instructions following the 
present load operation. 

Consistency-preserving mechanisms for store operations 
can include discarding all out-of-order state by activating 
mechanism 4 of FIG. Mechanism 4 discards all out-of-order 
operations and can be preferably extended to all stack 
rename locations to memory. When execution restarts, the 
modified value will loaded from memory by step 316 of FIG. 3 
or step 426 of FIG. 4. 
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Special care needs to be taken if stack load and store 
operations refer to data types of different sizes. If the 
load operation refers to a subset of data bits provided by 
the store operation, then these can be extracted in a simple 
5 manner. If the data referred to by a load operation is 

resident in one or more stack rename registers and possibly 
processor main memory, then more complex data gathering is 
required. 

According to one embodiment of the present invention, 
10 at least all overlapping registers are written to main 

memory and a memory read operation is performed. This is 
particularly simple if, in step 524 of FIG. 5, memory write 
instructions are entered into the instruction stream to 
record all stack references in processor memory in-order. 
15 According to another embodiment of the present 

invention, the processor discards all speculative state, 
forces all stack references to memory, and performs the 
memory load operation in-order. Again, this is particularly 
simple if, in step 524 of FIG. 5, memory write instructions 
20 are entered into the instruction stream to record all stack 

references in processor memory in-order. 

When the processor rename mechanism runs out of rename 
registers, rename registers which have no references in 
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pending instructions may be reclaimed and their contents 
written to the processor memory. This may require that the 
value stored by the rename register be written to memory if 
no in-order stores have been inserted in step 524 of FIG 5. 

5 In an optimized embodiment, a predictor is used to 

decide which stack references to rename into processor 
registers in step 424 of FIG. 4 and step 520 of FIG. 5 to 
reduce the number of registers allocated to stack 
references. Stack references not allocated in a processor 

10 register are performed by using load and store instructions 

into the main memory. 

According to one embodiment of the invention, rename 
registers can be shared for processor register and stack 
reference renaming. According to another embodiment of the 

15 invention, they are separate resources. 

According to an optimized embodiment, renaming of 
stack references is combined with the elimination of copy 
operations, thereby reducing the critical path. 

According to one embodiment, stack references are 

20 named using symbolic names of the for "stack pointer value + 
displacement". In this case, synchronization of names is 
required when the stack pointer contents are changed. Also, 
the distance between frame pointer and stack pointer is used 
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to translate FP-relative references to SP-relative 
references . 

According to another embodiment, addresses (e.g., 
effective, virtual or physical) are used to name the stack 
pointer. In an optimized embodiment, the stack pointer (and 
optionally, frame pointer, argument pointer, etc.) are 
shadowed in the rename unit, and speculatively adjusted in 
synchronization with the instruction stream being fetched. 
This reduces the number of memory ports necessary to the 
register file and allows more aggressive speculation during 
renaming . 

According to one embodiment, consistency-preserving 
operations for accesses to the local stack are implemented 
in hardware by associating each renamed stack reference with 
address information and performing a lookup of renamed stack 
references to determine whether the requested data is in a 
rename register or in an actual memory location. According 
to another embodiment, this is implemented using software 
only: a reference to a stack area using a general register 
causes an exception, and software synchronizes the renamed 
values with the present request. 

Although the illustrative embodiments have been 
described herein with reference to the accompanying 
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drawings, it is to be understood that the present system and 
method is not limited to those precise embodiments, and that 
various other changes and modifications may be affected 
therein by one skilled in the art without departing from the 
scope or spirit of the invention. All such changes and 
modifications are intended to be included within the scope 
of the invention as defined by the appended claims. 



Y0999-357 (8728-320) 



40 



WHAT IS CLAIMED IS; 

1. A method for renaming memory references to stack 
locations in a computer processing system, comprising the 

5 steps of: 

detecting stack references that use architecturally 
defined stack access methods; and 

replacing the stack references with references to 
processor-internal registers. 

10 

2. The method according to claim 1, further 
comprising the step of synchronizing an architected state 
between the processor- internal registers and a main memory 
of the computer processing system. 

15 

3. The method according to claim 2, wherein said 
synchronizing step comprises the step of inserting in-order 
write operations for all of the stack references that are 
write stack references. 

20 

4. The method according to claim 1, further 
comprising the step of performing a consistency-preserving 
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operation for a stack reference that does not use the 
architecturally defined stack access methods. 

5. The method according to claim 4, wherein said step 
5 of performing a consistency-preserving operation comprises 

the step of bypassing a value ' from a given 
processor- internal register to a load operation that 
references a stack area and that does not use the 
architecturally defined stack access methods. 

10 

6. The method according to claim 4, further 
comprising the step of synchronizing an architected state 
between the processor- internal registers and a main memory 
of the computer processing system, and wherein said step of 

15 performing a consistency-preserving operation comprises the 

step of recovering an in-order value for the stack reference 
from the main memory, upon performing said synchronizing 
step . 

20 7. The method according claim 6, wherein the in-order 

value is written to the main memory by an in-order write 
operation inserted into an instruction stream containing an 
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instruction corresponding to the stack reference, when the 
stack reference is a write stack reference. 

8. The method according to claim 6, further 
comprising the step of writing the in-order value to the 
main memory in response to a load operation that does not 
use the architecturally defined stack access methods. 

9. The method according to claim 4, wherein said step 
of performing a consistency-preserving operation comprises 
the steps of: 

discarding all out-of-order state of a processor the 
system; 

synchronizing an architected state between the 
processor- internal registers and a main memory of the 
computer processing system; and 

restarting execution after a store operation has been 
performed that does not use the architecturally defined 
stack access methods. 

10. The method according to claim 1, wherein the 
architecturally defined stack access methods comprise memory 
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accesses that use at least one of a stack pointer, a frame 
pointer, and an argument pointer. 



11. The method according to claim 1, wherein the 

5 architecturally defined stack access methods comprise push, 

pop, and other stack manipulation operations. 

12 . A method for renaming memory references to stack 
locations in a computer processing system, comprising the 

10 steps of: 

determining whether a load instruction references a 
location in a local stack using an architecturally defined 
register for accessing a stack location; 

determining whether a rename register exists for the 
15 referenced location in the local stack, when the load 

instruction references the location using the 
architecturally defined register; and 

replacing the reference to the location by a reference 
to the rename register, when the rename register exists. 

20 

13. The method according to claim 12, wherein the 
architecturally defined register corresponds to a pointer 
for accessing stack locations, 
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14. The method according to claim 13, wherein the 
pointer for accessing the stack locations is one of a stack 
pointer, a frame pointer, and an argument pointer. 

5 

15. The method according to claim 12, wherein the 
architecturally defined stack access methods comprise push, 
pop, and other stack manipulation operations. 

10 16. The method according to claim 12, wherein said 

step of determining whether the renaming register exists 
comprises the step of computing one of a symbolic address 
and an actual address of the location, 

15 17. The method according to claim 12, wherein said 

step of determining whether the rename register exists is 
performed during one of a decode, an address generation, and 
a memory access phase. 

20 18. The method according to claim 12, further 

comprising the step of performing the load instruction from 
one of a main memory and a cache of the system, when the 
rename register does not exist. 
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19. The method according to claim 12, further 
comprising the step of determining whether the load 
instruction references a location in any stack, including 
the local stack, using another register, when the load 
instruction does not reference the location using the 
architecturally defined register. 

20. The method according to claim 19, wherein said 
step of determining whether the load instruction references 
the location using the other register comprises the step of 
marking translation lookaside buffer (TLB) entries of pages 
in the local stack as containing stack references, when the 
load instruction references the location using the other 
register. 

21. The method according to claim 19, further 
comprising the step of performing the load instruction from 
one of a main memory and a cache of the system, when the 
load instruction does not reference the location using the 
other register. 
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22. The method according to claim 19, further 
comprising the step of executing a consistency-preserving 
mechanism to perform the load instruction from the stack 
area, when the load instruction references the location 
5 using the other register. 



23. The method according to claim 12, further 
comprising the step of allocating a rename register for the 
location, when the rename register does not exist. 

10 

24. The method according to claim 23, further 
comprising the step of inserting an operation, into an 
instruction stream containing the load instruction, to load 
the location from a processor of the system to the allocated 

15 rename register, upon allocating the rename register. 

25. The method according to claim 24, further 
comprising the step of replacing the reference to the 
location by a reference to the allocated rename register, 

20 upon inserting the operation. 
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26. A method for renaming memory references to stack 
locations in a computer processing system, comprising the 
steps of: 

determining whether a store instruction references a 
5 location in a local stack using an architecturally defined 

register for accessing a stack location; 

allocating a rename register for the location, when 
the store instruction references the location using the 
architecturally defined register; and 
10 replacing the reference to the location by a reference 

to the rename register. 

27. The method according to claim 26, further 
comprising the step of inserting an operation, into an 

15 instruction stream containing the store instruction, to 

store the location from the rename register to a main memory 
of the system, upon replacing the reference to the location 
by the reference to the rename register. 

20 28. The method according to claim 26, further 

comprising the step of determining whether the store 
instruction references a location in any stack, including 
the local stack, using another register, when the store 
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instruction does not reference the location using the 
architecturally defined register. 

29. The method according to claim 28, further 
5 comprising the step of performing the store instruction from 

one of a main memory and a cache of the system, when the 
store instruction does not reference the location using the 
other register. 

10 30. The method according to claim 28, further 

comprising the step of executing a consistency-preserving 
mechanism to perform the store instruction to the stack 
area, when the store instruction references the location 
using the other register. 

15 

31. A program storage device readable by machine, 
tangibly embodying a program of instructions executable by 
the machine to perform method steps for renaming memory 
references to stack locations in a computer processing 
20 system, the method steps comprising: 

detecting stack references that use architecturally 
defined stack access methods; and 
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replacing the stack references with references to 
processor-internal registers . 

32. The program storage device according to claim 31, 
5 further comprising the step of synchronizing an architected 

state between the processor- internal registers and a main 
memory of the computer processing system, 

33. The program storage device according to claim 32, 
10 wherein said synchronizing step comprises the step of 

inserting in-order write operations for all of the stack 
references that are write stack references. 



34, The program storage device program storage device 
15 according to claim 31, further comprising the step of 

performing a consistency-preserving operation for a stack 
reference that does not use the architecturally defined 
stack access methods . 

20 35. The program storage device according to claim 34, 

wherein said step of performing a consistency-preserving 
operation comprises the step of bypassing a value from a 
given processor- internal register to a load operation that 

Y0999-357 (8728-320) 50 



references a stack area and that does not use the 
architecturally defined stack access methods. 

36, The program storage device according to claim 34, 
5 further comprising the step of synchronizing an architected 

state between the processor- internal registers and a main 
memory of the computer processing system, and wherein said 
step of performing a consistency-preserving operation 
comprises the step of recovering an in-order value for the 
10 stack reference from the main memory, upon performing said 

synchronizing step. 

37. The program storage device according claim 36, 
wherein the in-order value is written to the main memory by 

15 an in-order write operation inserted into an instruction 

stream containing an instruction corresponding to the stack 
reference, when the stack reference is a write stack 
reference . 

20 38. The program storage device according to claim 36, 

further comprising the step of writing the in-order value to 
the main memory in response to a load operation that does 
not use the architecturally defined stack access methods. 
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39. The program storage device according to claim 34, 
wherein said step of performing a consistency-preserving 
operation comprises the steps of: 
5 discarding all out-of-order state of a processor the 

system; 

synchronizing an architected state between the 
processor- internal registers and a main memory of the 
computer processing system; and 
10 restarting execution after a store operation has been 

performed that does not use the architecturally defined 
stack access methods. 
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METHODS FOR RENAMING STACK REFERENCES 
IN A COMPUTER PROCESSING SYSTEM 

ABSTRACT OF THE INVENTION 
5 According to one aspect of the invention, there is 

provided a method for renaming memory references to stack 
locations in a computer processing system. The method 
includes the steps of detecting stack references that use 
architecturally defined stack access methods, and replacing 

10 the stack references with references to processor- internal 

registers. The architecturally defined stack access methods 
include memory accesses that use one of a stack pointer, a 
frame pointer, and an argument pointer. Moreover, the 
architecturally defined stack access methods include push, 

15 pop, and other stack manipulation operations. 
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